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Abstract. In phylogenetic inference one is interested in obtaining sam- 
ples from the posterior distribution over the tree space on the basis of 
some observed DNA sequence data. The challenge is to obtain samples 
from this target distribution without any knowledge of the normalizing 
constant. One of the simplest sampling methods is the rejection sampler 
due to von Neumann. Here we introduce an auto-validating version of 
the rejection sampler, via interval analysis, to rigorously draw samples 
from posterior distributions, based on homologous primate mitochondrial 
DNA, over small phylogenetic tree spaces. 

1 INTRODUCTION 

Obtaining samples from a density p{9) = p*{9)/Np, where 9 G @ and is a 
compact Euclidean subset, i.e., & C M", without any knowledge of the nor- 
malizing constant Np = jQp*{9)d9, is a basic problem in statistical inference. 
The usual Monte Carlo methods via conventional floating-point arithmetic are 
typically non-rigorous. We will concentrate on the rejection sampler due to von 
Neumann [1] and its rigorous extension for application in phylogenetics. The 
standard approaches to sampling from the posterior over phylogenies rely on 
Markov chain Monte Carlo (MCMC) methods. Despite their asymptotic valid- 
ity, it is nontrivial to guarantee that an MCMC algorithm has converged to 
stationarity [2], and thus MCMC convergence diagnostics on phylogenetic tree 
spaces are heuristic [3]. Thus, until now, no rigorous methodology has existed for 
perfectly sampling from the posterior distribution over phylogenetic tree spaces, 
even for 3 or 4 taxa. Here, we solve this rigorous posterior sampling problem 
over small phylogenetic tree spaces. 

After a brief introduction to the rejection sampler (RS) in Sect. 2, an interval 
version of this sampler is formalized in Sect. 3. This sampler is referred to as 
the Moore rejection sampler (MRS) in honor of Ramon E. Moore who was one 
of the influential founders of interval analysis [4]. In Sect. 4, we rigorously draw 
samples from the posterior over small tree spaces. We conclude in Sect. 5. Section 
7 summarizes our notation and gives a brief introduction to interval analysis, 
a prerequisite to understanding MRS. In Sect. 8, Lemma 1 shows that MRS 
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produces independent samples from the desired target density and Lemma 2 
describes the asymptotics of the acceptance probability for a refining family of 
MRSs. Unlike many conventional samplers, each sample produced by MRS is 
equivalent to a computer- assisted proof that it is drawn from the desired target, 
up to the pseudo-randomness of the underlying, deterministic, pseudo-random 
number generator. An open source C++ class library for MRS is publicly available 
from www . stats . ox . ac . uk/~sainudii/ codes . 

2 Rejection Sampler (RS) 

Rejection sampling [1] is a Monte Carlo method to draw independent samples 
from a target probability distribution p{9) = p*{6)/Np, where 9 £ & C K". 
Typically the target p is any density that is absolutely continuous with respect 
to the Lebesgue measure. In most cases of interest we can compute the target 
shape p*{9) for any 9 £ @, but the normalizing constant Np is unknown. The 
von Neumann RS can produce samples from p according to Algorithm 1 when 
provided with (i) a proposal density q{9) = q*{9)/Nq from which independent 
samples can be drawn, Nq = J^q*{9)d9 is known, and q*{9) is computable for 
any 9 £ & and (ii) a constant c defining the envelope function fq{9) = cq*{9), 
such that, 

fq{9)^cq*{9)>p*{9),y9£& . (1) 



Algorithm 1 von Neumann RS 

input: (1) a target shape p*, (2) a proposal density q, (3) an envelope function /, 

and (4) an integer TRIALSmax 

output: a sample from U distributed according to p 

initialize: TRIALS <= 0, SUCCESS <= false 

repeat 

DRAW T q {draw a sample from the random variable T with distribution 5} 
DRAW H ~ Uniform[G, fq{T)], where fq{T) > p'iT) 
if H <p*{T) then 

U ^T, SUCCESS <= true 
end if 

TRIALS <= TRIALS + 1 
until TRIALS < TRIALSmax or SUCCESS = true 



U generated by the above algorithm is distributed according to p [5] . Observe 
that the probability that a point proposed according to q gets accepted as 
an independent sample from p through the envelope function fq is the ratio of 
the integrals 

_ N.p ^ j&P*{0)d9 

and the probability distribution over the number of samples from q to obtain 
one sample from p is geometrically distributed with mean 1 / A^ [5] . 
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3 Moore Rejection Sampler (MRS) 

Moore rejection sampler (MRS) is an auto- validating rejection sampler (RS). 
It can produce independent samples from any target shape p* that has a well- 
defined natural interval extension P* (Definition 6) over a compact domain 0. 
MRS is said to be auto-validating because it automatically obtains a proposal q 
that is easy to simulate from, and an envelope fq that is guaranteed to satisfy the 
envelope condition (1). In summary, the defining characteristics and notations 
of MRS are: 

Compact domain = [9, 9] 

Target shape p*{9):&^R 

Target integral Np = Je,p*{9) dO 

Target density p{9) = ^ : ^ K 

Interval extension of p* P*{e) : I© ^ M 

Proposal shape '7*(^^) : ^ M 

Proposal integral Nq = /q q* (9) d9 

Proposal density q{9) = : ^ R 

Envelope function fq{9) = cq*{9) 

Envelope integral Nf^ = fq{9) d9 = cNg 
Acceptance probability = 

Partition of 2:= {0'^^\0^^\ ....O^™ }. 

If p* G £, the class of elementary functions (Definition 8), its natural interval 
extension P* is well-defined on and T = {0^^\O^^\ ...,0^™ } be a finite 
partition of 0, then by Theorem 4 we can enclose p*(6'''-'), i.e., the range of p* 
over the i-th element of T, with the interval extension P* of p* . 



P 



^(e«) c P*(0W) ^ [P*(6)W),P*(e(^))], Vi e {1,2,...,|T|} . (2) 



For the given partition T we can construct a partition-specific proposal q~ {9) as 
a normalized simple function over 0, 

m 

q'^i9) ^ {Nq.y J2p\e^^^)l^e e oi-n , (3) 

i=l 

with the normalizing constant Nqi = (^d{0^''^) ■ P*{0^^^)j , where, ^(6*) = 

d{[9_, 9]) = 9 — 9_ is the diameter of 0. The next ingredient /^i {9) for our rejection 
sampler can simply be 

_ 

/,.(0)=5]P*(e«)l{e,ew} . (4) 



The necessary envelope condition (1) is satisfied by fq^{9) because of (2). Now, 
we have all the ingredients to perform a more efficient partition-specific Moore 
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rejection sampling. Lemma 1 shows that if the target shape p* has a well-defined 
natural interval extension P*, and if U is generated according to Algorithm 1, 
and if the proposal density q~ (9) and the envelope function /^r (6) are given by 
(3) and (4), respectively, then U is distributed according to the target p. Note 
that the above arguments as well as those in the proof of Lemma 1 naturally 
extend when C K." for rt > 1. In the multivariate case, 0<-^^ e IR" (Definit ion 
4) is a box. Thus, we naturally replace the diameter of an interval by the volume 
of a box = Y[k=i d'iOl) . The envelopes and proposals are now simple 

functions over a partition of the domain into boxes. Analogous to the univariate 
case, the accepted samples are uniformly distributed in the region S C E"+^ 
'under' p* and 'over' 0. Hence their density is p [5]. 

Next we bound the acceptance probability Ay ^ ^ A?^ for this sampler. Due 
to the linearity of the integral operator and (2), 

Np^J^P*{e)de 

gEE'i (d(eW)-p*(0W)) 

Therefore, 

If p* G €£, the Lipschitz class of elementary functions (Definition 10), 
then we might expect the enclosure of Np to be proportional to the mesh 
w = maxi£{i J} c?(0'-'^) of the partition T. Lemma 2 shows that if p* € and 
iiw is a uniform partition of into W intervals, then the acceptance probability 
■^Uw ~ "'^ ~ 0{1/W). Thus, the acceptance probability approaches 1 at a rate 
that is no slower than linearly with the mesh. We can gain geometric insight 
into the sampler from an example. The dashed lines of a given shade, depicting 
a simple function in Fig. 6, is a partition-specific envelope function (4) for the 
target shape s*{x) = — ^ ^ sin (^^^j^) over the domain = [—10, 6] and 

its normalization gives the corresponding proposal function (3). As the refine- 
ment of proceeds through uniform bisections, the partition size increases as 2\ 
i = 1,2,3,4. Each of the corresponding envelope functions in increasing shades 
of gray can be used to draw auto- validated samples from the target s{x) over 0. 
Note how the acceptance probability (ratio of the area below the target shape 
to that below the envelope) increases with refinement. 

We studied the efficiency of uniform partitions for their mathematical tractabil- 
ity. In practice, we may further increase the acceptance probability for a given 
partition size by adaptively partitioning 0. In our context, adaptive means the 
possible exploitation of any current information about the target. We can refine 
the current partition Tq and obtain a finer partition Tq/ with an additional box 
by bisecting a box 0'*-' S la along the side with the maximal diameter. There 



Nr. 



5 



arc several ways to choose a e la for bisection. When O^'^ G M" has vol- 
ume u(6''*^), an optimal choice for 0**^ = argmaxgi(,)g^^ (u(6''*^) • d(P*(6''*^)). 
Under this partitioning scheme, we employ a priority queue to conduct sequential 
refinements of 0. This approach avoids the exhaustive argmax computations to 
obtain the Q*-*-* for bisection at each refinement step. Once we have any parti- 
tion T of 0, we can efficiently sample 9 ^ given by (3) in two steps. First we 
sample a box 0^'^ £ T according to the discrete distribution t(6'^*-'), 

and then we choose a € uniformly at random. Sampling from large discrete 
distributions (with million states or more) can be made faster by preprocessing 
the probabilities and saving the result in some convenient lookup table. This 
basic idea [6] allows samples to be drawn rapidly. We employ a more efficient 
preprocessing strategy [7] that allows samples to be drawn in constant time 
even for very large discrete distributions as implemented in the GNU Scientific 
Library [8]. Thus, by means of priority queues and lookup tables we can efficiently 
manage our adaptive partitioning of the domain for envelope construction, and 
rapidly draw samples from the proposal distribution. We used the Mersenne 
Twister random number generator [9] in this paper. Our sampler class builds on 
C-XSC 2.0, a C++ class library for extended scientific computing using interval 
methods [10]. All computations were done on a 2.8 GHz Pentium IV machine 
with 1GB RAM. Having given theoretical and practical considerations to our 
Moore rejection sampler, we are ready to draw samples from various targets. 



4 Auto-validating Independent Posterior Samples from 
Triplets and Quartets 

Inferring the ancestral relationship among a set of species based on their DNA 
sequences is a basic problem in pliylogenetics [11]. One can obtain the likelihood 
of a particular phylogcnetic tree that relates the species of interest by superim- 
posing a simple Markov model of DNA substitution due to Jukes and Cantor 
[12] on that tree. The length of an edge (branch length) connecting two nodes 
(species) in the tree represents the amount of evolutionary time (divergence) 
between the two species. The likelihood function over trees obtained through 
a post-order traversal (e.g. [13]) has a natural interval extension over boxes of 
trees [14]. This allows us to draw samples from the posterior distribution over a 
compact box specified by our prior distribution on the tree space using our MRS. 
We assume a uniform prior over the possible unrooted topologies and a uniform 
product prior over all branch lengths in the range [10~^°, 10]. We consider two 
mitochondrial DNA data sets. 



4.1 Chimpanzee, Gorilla, Orangutan and Gibbon 

Our posterior distribution is based on the data from an 895 bp long homolo- 
gous segment of mitochondrial DNA from chimpanzee, gorilla, orangutan, and 
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gibbon, containing the genes for three transfer RNAs and parts of two proteins 
[15]. Under the assumption of independence across sites, the sufficient statistics 
only comprise of the distinct site patterns and their counts. The data for chim- 
panzee, gorilla and orangutan can be summarized by the following 29 distinct 
site patterns and counts: 

site : 11111111112222222222 

pattern : 12345678901234567890123456789 



chimpanzee : agctatcacccatctgccgtactaagcgt 
gorilla : agctgttatcaacacgcaaaatccggtat 
orangutan : agctaccgttcccataataataaagcgca 



site :27211311921823187192421211213 
pattern :31263168 2 

counts : 2 9 8 

In the above data set, the first column (l.aaa.232) expresses that there are 
232 site patterns with nucleotide 'a' in all three species, and the last col- 
umn (29.tta.3) expresses that there are 3 site patterns with nucleotide 't' in 
chimpanzee and gorilla, and nucleotide 'a' in orangutan. The data for all four 
primates can be summarized by 61 distinct site patterns as parsed in [14]. 10000 
independent samples were drawn in 942 CPU seconds from the posterior dis- 
tribution over Jukes-Cantor triplets, i.e. unrooted trees with three edges corre- 
sponding to the three primates emanating from their common ancestor. Figure 
1 shows these samples (blue dots) scattered about the verified global maximum 
likelihood estimate (MLE) of the triplet obtained in [14] and subsequently con- 
firmed algebraically in [16]. We also drew 10000 independent samples from the 
posterior based on the 198 tRNA-coding DNA sites (green dots in Fig. 1) as 
well as from that based on the remaining 697 protein-coding sites (red dots in 
Fig. 1). The former posterior samples, corresponding to the tRNA-coding sites, 
are more dispersed than the posterior samples based on the entire sequence. 
This is due to the smaller number of tRNA-coding sites making the posterior 
less concentrated. We were able to reject the null hypothesis of homogeneity 
between the posterior samples based on the tRNA-coding sites and that based 
on the protein-coding sites at the 10% significance level (P-value = 0.06 from 
a non-parametric bootstrap of Hotelling's trace statistic based on 100 random 
permutations of the sites). Any biological interpretation of this test must be 
done cautiously since the Jukes and Cantor model employed here forbids any 
transition:transversion bias that is reportedly relevant for this data [15]. 

[Fig. 1 about here.] 

We were able to draw samples from Jukes-Cantor quartets by adding the homol- 
ogous sequence of the Gibbon. Now, the problem is a more challenging because 
there are three distinct tree topologies in the unrooted, bifurcating, quartet tree 
space, and each of these topologies has five edges. Thus, the domain of quar- 
tets is a piecewise Euclidean space that arises from a fusion of 3 distinct five 
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dimensional orthants. Since the post-order traversals specifying the hkehhood 
function are topology-specific, we extended the likelihood over a compact box 
of quartets in a topology-specific manner. The computational time was about a 
day and a half to draw 10000 samples from the quartet target due to low accep- 
tance probability of the naive likelihood function based on the 61 distinct site 
patterns. All the samples had the topology which grouped Chimp and Gorilla 
together, i.e. ((Chimp, Gorilla), (Orangutan, Gibbon)). The samples were again 
scattered about the verified global MLE of the quartet [14]. The marginal triplet 
trees (gray dots) within the sampled quartets are depicted in Fig. 2. This quartet 
likelihood function has an elaborate DAG (Definition 9) with numerous opera- 
tions. When the data got compressed into sufficient statistics through algebraic 
statistical methods [17], the efficiency increased tremendously (e.g. for triplets 
the efficiency increases by a factor of 3.7). This is due to the number of leaf nodes 
in the target DAG, which encode the distinct site patterns of the observed data 
into the likelihood function, getting reduced from 29 to 5 for the triplet target 
and from 61 to 15 for the quartet target [17]. Poor sampler efficiency makes it 
currently impractical to sample from trees with five leaves and 15 topologies (see 
Sect. 5 for a discussion on improvements). However, one could use such triplets 
and quartets drawn from the posterior distribution to stochastically amalgamate 
and produce estimates of larger trees via fast amalgamating algorithms [18, 19], 
which may then be used to combat the slow mixing in MCMC methods [3] by 
providing a good set of initial trees. 

[Fig. 2 about here.] 
4.2 Neandertal, Human and Chimpanzee 

We used the whole mitochondrial genome shotgun sequence (gi|115069275) of a 
Neandertal fossil Vi-80, from Vindija cave, Croatia [20], and its homologous se- 
quence in a human (gi|13273200) and a chimpanzee (gi|1262390), as summarized 
by the 15 sufficient site patterns and their counts below, to conduct statistical 
inference about the human-neandcrtal divergence time. 
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We drew 10000 auto- validating independent samples from each of three poste- 
rior distributions; (1) over the space of unrooted triplets under the Jukes-Cantor 
model in 312 CPU seconds, (2) over the clocked and rooted triplets under a 
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Jukes-Cantor model in 375 CPU seconds and (3) over the clocked and rooted 
triplets under a more general mutational model due to Hasegawa, Kishino and 
Yano (HKY) [21] in 1.2 CPU hours. In the HKY model we used the empirical 
nucleotide frequencies from the data {ttt = 0.2588, ttc ~ 0.2571, tta = 0.2916, 
TTa = 0.1925) and a hominid-specific transition/transversion rate of 2.0. Unlike 
the Jukes-Cantor model, all 15 distinct site patterns are minimally sufficient un- 
der the HKY model and this is reflected in its longer CPU time. Both models 
gave similar posterior samples over rooted triplets, as shown in Fig. 3. 

[Fig. 3 about here.] 

We transformed the three posterior distributions over the triplet spaces; (1) 
unrooted Jukes-Cantor triplets that were rooted using the mid-point rooting 
method, (2) rooted Jukes-Cantor triplets and (3) rooted HKY triplets, respec- 
tively, into three posterior distributions over the human-neandertal divergence 
time relative to the human-chimp divergence time (Fig. 4). The correspond- 
ing posterior quantiles ({5%, 50% ,95%}) for the human-neandertal divergence 
times are {0.0643 ,0.125 ,0.214}, {0.0694 ,0.142 ,0.263} and {0.0682 ,0.143 
,0.268}, respectively. We constrained the neandertal lineage to be a fraction of 
the human lineage in branch length in order to estimate the age of the neander- 
tal fossil from the rooted HKY triplets. The posterior quantiles of the fossil date 
in units of human-chimp divergence is {0.00685 ,0.0666 ,0.195}. The estimate 
of 38, 310 years based on carbon-14 accelerator mass spectrometry [20] is within 
our [5%, 95%] posterior quantile interval for the fossil date, provided the human- 
chimp divergence estimates ranges in [196103, 5.6 x 10% Thus, reasonable bounds 
for the human-chimp divergence are 4 x 10^ and 5.6 x 10^ years. Based on these 
calendar year estimates, we transformed the posterior quantiles of the human- 
neandertal divergence times from the rooted HKY triplets into {272680 , 571124 
,1073375} and {381752 ,799574 ,1502724}, respectively. Our [5%, 95%] poste- 
rior intervals contain the interval estimate of [461000, 825000] years reported in 
[20]. However, our confidence intervals are from perfectly independent samples 
from the posterior and account for the finite number of neandertal sites that 
were successfully sequenced, unlike those obtained on the basis of a bootstrap of 
site patterns [22] or heuristic MCMC [2]. Unfortunately, our Imman-neandertal 
divergence estimates are overestimates as they ignore the non-negligible time to 
coalescence of the human and neandertal homologs within the human-neandertal 
ancestral population. Improvements to our estimates based on the other 310 hu- 
man and 4 chimpanzee homologs reported in [20] may be possible with more 
sophisticated models of populations within a phylogeny and need further inves- 
tigation. 

[Fig. 4 about here.] 

5 Conclusion 

Interval methods provide for a rigorous sampling from posterior target densities 
over small phylogenetic tree spaces. When one substitutes conventional floating- 
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point arithmetic for real arithmetic in a computer and uses discrete lattices to 
construct the envelope and/or proposal, it is generally not possible to guaran- 
tee the envelope property, and thereby ensure that samples are drawn from the 
desired target density, except in special cases [23]. Thus, the construction of 
the Moore rejection sampler through interval methods, that enclose the target 
shape over the entire real continuum in any box of the domain with machine- 
representable bounds, in a manner that rigorously accounts for all sources of 
numerical errors (see [24] for a discussion on error control), naturally guarantees 
that the Moore rejection samples are independent draws from the desired tar- 
get. Moreover, the target is allowed to be multivariate and/or non-log-concave 
with possibly 'pathological' behavior, as long as it has a well-defined interval 
extension. 

The efficiency of MRS is not immune to the curse of dimensionality and tar- 
get DAG complexity. When the DAG for the likelihood gets large, its natural 
interval extension can have terrible over-enclosures of the true range, which in 
turn forces the adaptive refinement of the domain to be extremely fine for ef- 
ficient envelope construction. Thus, a naive application of interval methods to 
targets with large DAGs can be terribly inefficient. In such cases, sampler effi- 
ciency rather than rigor is the issue. Thus, one may fail to obtain samples in a 
reasonable time, rather than (as may happen with non-rigorous methods) pro- 
duce samples from some unknown and undesircd target. There are several ways 
in which efficiency can be improved for such cases. First, the particular structure 
of the target DAG should be exploited to avoid any redundant computations. 
For example, algebraic statistical methods can be used to find sufficient statistics 
to dissolve symmetries in the DAG as done in Sect. 4. Second, we can further 
improve efficiency by limiting ourselves to differentiable targets in C". Tighter 
enclosures of the range p* (6)W) with P*(6'W) can come from the enclosures of 
Taylor expansions of p* around the midpoint m{0^^^) through interval-extended 
automatic differentiation (see [24] ) that can then yield tighter estimates of the in- 
tegral enclosures [25]. Third, we can employ pre-processing to improve efficiency. 
For example, we can pre-enclose the range of a possibly rescaled p* over a par- 
tition of the domain and then obtain the enclosure of P* over some arbitrary O 
through a combination of hash access and hull operations on the pre-enclosures. 
Such a pre-enclosing technique reduces not only the overcstimation of target 
shapes with large DAGs but also the computational cost incurred while per- 
forming interval operations with processors that are optimized for floating-point 
arithmetic. Fourth, efficiency at the possible cost of rigor can also be gained (up 
to 30% ) by foregoing directed rounding during envelope construction. 
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7 Appendix A 

Definition 1. Let X ^ [x,x] be an interval in IM. ^ {[x,x] : x<x,x,x & R} 

Definition 2 (Interval arithmetic). // the binary operator * is one of the 

elementary arithmetic operations {+,—,-,/}, then we define an arithmetic on 
operands in IM by 

Xi^Y = {xi.y:xeX,yeY} 
with the exception that X/Y is undefined ifOGY. 
Theorem 1. Arithmetic on the pair X,Y G IM is given by: 



Proof (cf. [25]): Since any real arithmetic operation x*y, where * £ {+, — , •, /} 
and cc, ?/ G M, is a continuous function x-ky = *{x, y) : M x R — > M, except when 
y = under / operation. Since X and Y are simply connected compact intervals, 
so is their product X xY. On such a domain X xY, the continuity of -k{x, y) (ex- 
cept when * = / and G Y) ensures the attainment of a minimum, a maximum 
and all intermediate values. Therefore, with the exception of the case when * = / 
and £ y, the range X-kY has an interval form [min [x-ky], max {x ★ y)] , where 
the min and max are taken over all pairs [x, y) G X xY . Fortunately, we do not 
have to evaluate x-ky over every (x, y) G X xY to find the global min and global 
max of k:{x,y) over X x Y, because the monotonicity of the ★(a;,y*) in terms 
of a; G X for any fixed y* S F implies that the extremal values are attained on 
the boundary of X xY, i.e., the set {x,y,x, and y}. Thus the theorem can be 
verified by examining the finitely many boundary cases. □ 

An extremely useful property of interval arithmetic that is a direct conse- 
quence of Definition 2 is summarized by the following theorem. 

Theorem 2 (Fundamental property of interval arithmetic). If X C X' 

and Y C_Y' and -k G {+, — , •, /}, then 

where we require that Q ^Y' when -k = j . 
Proof: 

XkY = {x-ky: xCiX,y(^Y}^ {x-ky: xeX' ,y(iY'} = X' -k.Y' U 

Note that an immediate implication of Theorem 2 is that when X ~ x and 
Y = y are thin intervals (real numbers x and y), then X' k Y' will contain the 
result of the real arithmetic operation x-ky. 



X + Y 
X-Y 
X ■ Y 
X/Y 



[x + y,x + y] 
[x-y,x-y\ 

[min{xy, xy, xy, xy}, max{xy, xy, xy, xy}] 
X ■ [l/y, l/y], provided, ^Y. 
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Definition 3 (Range). Consider a real-valued function f : D ^ M. where the 
domain D C M". The range of f over any E C_ D is represented by Rng(f;E) 
and defined to be the set 

Rng[f-E)^{f{x):xeE} 

However, when the range of f over any X G IM" such that X C D is of interest, 
we will use the short-hand f{X) for Rng(f; X) . 

Definition 4 (Interval extension of subsets of R"). For any Euclidean 
subset © C R" let us denote its interval extension by I© and define it to be the 
set 

m = {x e m" : x,xe ©} 

We refer the the kth interval of interval vector or box X e IM" by Xk- 

Definition 5 (Inclusion isotony). An box-valued map F : D ^ IK™, where 
D G IM" , is inclusion isotonic if it satisfies the property 

\fX CY CD =^ F{X) C F{Y). 

Definition 6 (The natural interval extension). Consider a real-valued func- 
tion f : D ^ R given by a formula, where the domain D G IM" . // real constants, 
variables, and operations in f are replaced by their interval counterparts, then 
one obtains 

F{X) : ID IM. 

F is known as the natural interval extension of f. This extension is well-defined 
if we do not run into division by zero. 

Theorem 3 (Inclusion isotony of rational functions). Consider the ra- 
tional function f{x) — p{x)/q{x), where p and q are polynomials. Let F be its 
natural interval extension such that F{Y) is well-defined for some y G IM and 
let X, X' G IM. Then we have 

[i) Inclusion isotony: C X' C F =^ F{X) C F{X') , and 
(m) Range enclosure: VX C y =^ Rng{f-X) = f{X) C F{X). 

Proof (cf. [25]): Since F{Y) is well-defined, we will not run into division by 
zero, and therefore (i) follows from the repeated invocation of Theorem 2. We 
can prove (ii) by contradiction. Suppose Rng{f;X) <j- F{X'). Then there exists 
X ^ X, such that f(x) G Rng{f;X) but /(x) ^ F{X). This in turn implies 
that /(x) = F{[x,x]) ^ F{X), which contradicts (i). Therefore, our supposition 
cannot be true and we have proved (ii) Rng{f\X) C F{X). □ 

Definition 7 (Standard functions). Piece-wise monotone functions, includ- 
ing exponential, logarithm, rational power, absolute value, and trigonometric 
functions, constitute the set of standard functions 

6 = { a^, logf^{x), x^^'^ , sin(2;), cos(a;), tan(a;), sinh(x), . . . , arcsin(x), . . . }. 
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Such functions have wcU-dcfincd interval extensions that satisfy inclusion isotony 
and exact range enclosure, i.e., Rng{f;X) — f{X) ~ F{X). Consider the fol- 
lowing definitions for the interval extensions for some monotone functions in 6 
with X e IM, 

exp(X) = [exp(x), exp(s)] 
arctan(X) = [arctan(x), arctan(3;)] 

V(X) = \/M] if < X 

log(X) = [log(£),log(x)] ifO<^ 

and a piece-wise monotone function in S with Z+ and Z~ representing the set 
of positive and negative integers, respectively. 

: if ?i e Z+ is odd, 
[(X)",|Xr] : if ?i e Z+ is even, 
[1,1] :ifn = 0, 

1/x]-" :if?ieZ-;O^X 

Definition 8 (Elementary functions). A real-valued function that can be ex- 
pressed as a finite combination of constants, variables, arithmetic operations, 
standard functions and compositions is called an elementary function. The set 
of all such elementary functions is referred to as C;. 

Definition 9 (Directed acyclic graph (DAG) of a function). One can 

think of the process by which an elementary function f is computed as the result 
of a sequence of recursive operations with the subexpressions fi of f where, i = 
1, . . . ,n < oo. This involves the evaluation of the subexpression fi at node i with 
operands Si- , Si^ from the sub-terminal nodes of i given by the directed acyclic 
graph (DAG) for f 

{fi{si^, Si^) : if node i has 2 sub-terminal nodes Si^, Si^ 
fi{si-^) : if node i has 1 sub-terminal node Si^ (6) 

I{si) : if node i is a leaf or terminal node, I(x) ~ x. 

The leaf or terminal node of the DA G is a constant or a variable and thus the 
fi for a leaf i is set equal to the respective constant or variable. The recursion 
starts at the leaves and terminates at the root of the DAG. The DAG for an 
elementary f with n sub- expressions /i, /2, . . . , /n is : 

{Qfi\U 0/n = /(^), (7) 

where each Qfi is computed according to (6). 

For example the elementary function x ■ sin((a; — 3)/3) can be obtained from 
the terminus O/e of the recursion {0/i}f^i on the DAG for / as shown in Fig. 5. 

[Fig. 5 about here.] 
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It would be convenient if guaranteed enclosures of the range f{X) of an elemen- 
tary / can be obtained by its natural interval extension F{X). We show that 
inclusion isotony does indeed hold for F, i.e. if X CY, then F{X) C F{Y), and 
in particular, the inclusion property that x £ X => fix) S F{X) does hold. 

Theorem 4 (The fundamental theorem of interval analysis). Consider 
any elementary function / G (£. Let F : Y ^IM be its natural interval extension 
such that F{Y) is well-defined for some Y E IR. and let X, X' G IK. Then we 
have 

{i) Inclusion isotony: \fX CX' CY =^ F{X) C F{X') , and 
(m) Range enclosure: WX CY =^ Rng{f;X) = f{X) C F{X). 

Proof (cf. [25]): Any elementary function / G € is defined by the recursion 7 
on its sub-expressions fi where i G {1, . . . , n} according to its DAG. If f{x) = 
p{x)/q{x) is a rational function, then the theorem already holds by Theorem 
3, and if / G 6 then the theorem holds because the range enclosure is exact 
for standard functions. Thus it suffices to show that if the theorem holds for 
/ij /2 € £, then the theorem also holds for /i * /2, where ★ G — , /, ■, o}. By 
o we mean the composition operator. Since the proof is analogous for all five 
operators, we only focus on the o operator. Since F is well-defined on its domain 
y, neither the real-valued / nor any of its sub-expressions fi have singularities 
in its respective domain Yi induced by Y. In particular /2 is continuous on any 
X2 and X2 such that X2 C X2 C Y2 implying the compactness of ^2(^2) = W2 
and F2{X2) = W2, respectively. By our assumption that Fi and F2 are inclusion 
isotonic we have that W2 C W2 and also that 

Fi o ^2(^2) = FiiF2iX2)) - ^1(1^2) C Fi(W^) = Fi(F2(X^)) = Fi o ^2(^2) 

The range enclosure is a consequence of inclusion isotony by an argument iden- 
tical to that given in the proof for Theorem 3. □ 

The fundamental implication of the above theorem is that it allows us to 
enclose the range of any elementary function and thereby produces an upper 
bound for the global maximum and a lower bound for the global minimum 
over any compact subset of the domain upon which the function is well-defined. 
We will see in the sequel that this is the work-horse of randomized enclosure 
algorithms that efficiently produce samples even from highly multi-modal target 
distributions. 

Unlike the natural interval extension of an / G 6 that produces exact range 
enclosures, the natural interval extension F{X) of an / G € often overestimates 
the range f{X), but can be shown under mild conditions to linearly approach the 
range as the maximal diameter of the box X goes to zero, i.e., i){F{X), f{X)) < 
a ■ doo{X) = maxi d{Xi) for some a > 0. This implies that a partition of X into 
smaller boxes {X<^^\--- gives better enclosures of f{X) through the 

union 

U" 1 F{X'-''^) as illustrated in Fig. 6. Next we make the above statements 
precise. 



[Fig. 6 about here.] 
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Definition 10. A function f : D ^ R is Lipschitz if there exists a Lipschitz 
constant K such that, for all x,y G D, we have \f{x) — fiy)\ < K\x ~ y\. 
We define €£ to be the set of elementary functions whose sub- expressions fi, 
i = 1, . . . ,n at the nodes of the corresponding DAGs are all Lipschitz. 

Theorem 5 (Range enclosure tightens Hnearly with mesh). Consider 
a function f : D ^ W with f E ^si- Let F be an inclusion isotonic interval 
extension of f such that F{X) is well-defined for some X g IM, X C_ I . Then 
there exists a positive real number K, depending on F and X, such that if X = 
UjLiXW, then 

k 

i?nff(/;X)C |jF(X«)Ci^(X) 

i=l 

and 

r < r{Rng{f; X)) + K max^ r(XW) 

Proof : The proof is given by an induction on the DAG for / similar to the 
proof of Theorem 4 (See [25]). 



8 Appendix B 

Here we will study the Moore rejection sampler (MRS) carefully. Lemma 1 shows 
that MRS indeed produces independent samples from the desired target and 
Lemma 2 describes the asymptotics of the acceptance probability as the partition 
of the domain is refined. 

Lemma 1. Suppose that the target shape p* has a well-defined natural interval 
extension P* . If U is generated according to Algorithm 1, and if the proposal 
density q^{d) and the envelope function fqi{9) are given by (3) and (4), respec- 
tively, then U is distributed according to the target p. 

Proof: From (3) and (4) observe that fq^{t) = q^{t)Nqi. Let us define the 
following two subsets of M^, 

Bq = {{t,h) ■.0<h< fgi{t)}, and Bp = {{t,h) : < h<p*{t)}. 

First let us agree that Algorithm 1 produces a pair (T, H) that is uniformly 
distributed on Bq. We can see this by letting k{t, h) denote the joint density of 
{T,H) and k{h\t) denote the conditional density of H given T = t. Then, 



q'^{t)k{h\t) if{t,h)eBq 
otherwise . 



k{t,h) 

Since we sample a uniform height h for a given t, 
k{h\t) 



ifgHt))-' = {q^it)Nq.r' if he [OJq.it)] 

otherwise. 
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Therefore, 

k{t,h) 



otherwise . 



Thus we have shown that the joint density of {T, H) is a uniformly distribution 
on Bq. The above relationship also makes geometric sense since the volume of Bq 
is exactly Nqi. Now, let {T* ,H*) be an accepted point, i.e., (T*, H*) £ Bp C Bq. 
Then, the uniform distribution of (T, H) on Bq implies the uniform distribution of 
{T*,H*) on Bp. Since the volume of Bp is Np, the p.d.f. of (T*, H*) is identically 
1/Np on Bp and elsewhere. Hence, the marginal p.d.f. of [/ = T* is 

w{u)=jf^''h/Npdh 
= l/Npjf^-hdh 

= yNp 1 dh, :■ p{u) = p*{u)lNp 

= p{u). □ 

Lemma 2. LetiLw be the uniform partition of @ = [0,9] into W intervals each 
of diameter w 

(0-9) 

0w = [O.+ {i- e + iw],i = l,...,W 

iiw ={e^^ ,i = i,...,w}. 



and let p* G €£ , then 
Proof 

Then by means of Theorem 5 



d{e'§)^o{i/w) =^ (,(p*(0(;)),p*(0«) ) = o(i/w^) 



Therefore 

(d(0«)-P*(0«)) =u;^P*([^+(z-lK ^ + 

i=l 1=1 

and we have 

<^E!Ii^*(0iy)) = WW^) =^ AP^^, = 1 - 0(1/VF) 

Therefore the lower bound for the acceptance probability A^^^ of MRS ap- 
proaches 1 no slower than linearly with the refinement of by iiw- Note that 
this should hold for a general nonuniform partition with w replaced by the 
mesh.n 
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Fie. 1. 10000 Moore reiection samples from the posterior distribution over the three 
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Fig. 2. 1000 Moore rejection samples (gray dots) Irom the posterior distribution over 
the unrooted quartet tree space of Chimpanzee (Ch.), Gorilla (Go.), Orangutan (Or.) 
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Nean. 



Fig. 3. 10000 Moore rejection samples each from the posterior distribution over the 
three branch lengtlis of the rooted phylogenetic tree space of Chimpanzee, Human and 
Neandertal under the Jukes-Cantor model (blue dots) and the HKY model (red dots) 




Fig. 4. Posterior distribution over the human-neandertal divergence time relative to 
the human-cliimp divergence time based on 10000 independent samples from the (1) 
Midpoint-rooted tree estimates of the unrooted triplets under the Jukes-Cantor model 
(light gray), (2) rooted triplets under the Jukes-Cantor model (dark gray), and (3) 
rooted triplets under HKY model (black) 
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Fig. 5. Recursive evaluation of tlie sub-expressions /i, . . . , /g on tlie DAG of tlie ele- 
mentary function j[x) — Q/e = x ■ sin((a; — 3)/3) 
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Fig. 6. Range enclosure of the interval extension of ~Yl\-i^^ ^in 

(Mi^) linearly 

tightens with the mesh. 



